Collocation or Free Combination? ― Applying Machine Translation Techniques to identify collocations in Japanese

Authors

  • Lis Pereira
  • Elga Strafella
  • Yuji Matsumoto

Abstract

This work presents an initial investigation into how to distinguish collocations from free combinations. The assumption is that, while free combinations can be translated literally, the overall meaning of a collocation differs from the sum of the translations of its parts. On that basis, we verify whether a machine translation system can help us make this distinction. Results show that this approach improves precision compared with standard collocation identification methods based on statistical association measures.
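
The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It makes several assumptions. Candidates are noun-verb pairs joined by the particle を. Pointwise mutual information (PMI) stands in as one representative statistical association measure. The machine translation system is replaced by a hypothetical toy dictionary (`TOY_MT`, `toy_translate`). The function names `pmi_scores`, `is_literal`, and `collocation_candidates`, as well as the `threshold` value, are likewise illustrative. A high-PMI pair is kept as a collocation only when the translation of the whole phrase does not contain the translations of its parts.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def pmi_scores(pairs: List[Pair]) -> Dict[Pair, float]:
    """PMI of each (noun, verb) pair, estimated from pair and marginal counts."""
    n = len(pairs)
    pair_counts = Counter(pairs)
    noun_counts = Counter(noun for noun, _ in pairs)
    verb_counts = Counter(verb for _, verb in pairs)
    scores: Dict[Pair, float] = {}
    for (noun, verb), count in pair_counts.items():
        p_joint = count / n
        p_noun = noun_counts[noun] / n
        p_verb = verb_counts[verb] / n
        scores[(noun, verb)] = math.log2(p_joint / (p_noun * p_verb))
    return scores


def is_literal(phrase: str, parts: Iterable[str],
               translate: Callable[[str], str]) -> bool:
    """True if every word of each part's translation appears in the
    translation of the whole phrase (assumed sign of a free combination)."""
    whole_words = set(translate(phrase).lower().split())
    return all(set(translate(part).lower().split()) <= whole_words
               for part in parts)


def collocation_candidates(pairs: List[Pair],
                           translate: Callable[[str], str],
                           threshold: float = 0.5,
                           particle: str = "を") -> List[Pair]:
    """Keep pairs with PMI >= threshold whose translation is NOT literal."""
    kept: List[Pair] = []
    ranked = sorted(pmi_scores(pairs).items(), key=lambda kv: -kv[1])
    for (noun, verb), score in ranked:
        if score < threshold:
            continue  # weakly associated: treated as chance co-occurrence
        phrase = noun + particle + verb  # hypothetical noun-を-verb pattern
        if not is_literal(phrase, [noun, verb], translate):
            kept.append((noun, verb))
    return kept


# Hypothetical stand-in for a real MT system: a toy phrase dictionary.
TOY_MT = {
    "風邪を引く": "catch a cold", "風邪": "cold", "引く": "pull",
    "本を読む": "read a book", "本": "book", "読む": "read",
    "傘を買う": "buy an umbrella", "傘": "umbrella", "買う": "buy",
    "本を買う": "buy a book",
}


def toy_translate(text: str) -> str:
    """Look up a translation; unknown input is returned unchanged."""
    return TOY_MT.get(text, text)


# Toy corpus of extracted (noun, verb) occurrences.
PAIRS: List[Pair] = ([("風邪", "引く")] * 3 + [("本", "読む")] * 3
                     + [("本", "買う")] * 2 + [("傘", "買う")] * 2)

if __name__ == "__main__":
    print(collocation_candidates(PAIRS, toy_translate))
```

On this toy data, 本を読む ("read a book") and 傘を買う ("buy an umbrella") both pass the PMI threshold. Both are then filtered out as literal free combinations. Only 風邪を引く ("catch a cold", literally "pull a cold") is kept, so the script prints `[('風邪', '引く')]`. The filter shows why a translation check can raise precision over association scores alone. Here the high-PMI pair 傘を買う would otherwise be reported as a collocation.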

Similar resources

Mining Japanese Compound Words and Their Pronunciations from Web Pages and Tweets

Mining compound words and their pronunciations is essential for Japanese input method editors (IMEs). We propose to use a chunk-based dependency parser to mine new words, collocations and predicate-argument phrases from large-scale Japanese Web pages and tweets. The pronunciations of the compound words are automatically rewritten by a statistical machine translation (SMT) model. Experiments on a...

Extracting Bilingual Collocations from Non-Aligned Parallel Corpora

This paper proposes a new method to find correspondences of uninterrupted collocations from Japanese-English bilingual corpora without sentence-to-sentence alignment. Uninterrupted collocations in English such as “once again”, “give up”, or “gross national product” handled as a single word or a compound word in Japanese, can be automatically extracted with corresponding Japanese words using wor...

Data Mining Meets Collocations Discovery

In this paper we discuss the problem of discovering interesting word sequences in the light of two traditions: sequential pattern mining (from data mining) and collocations discovery (from computational linguistics). Smadja (1993) defines a collocation as “a recurrent combination of words that cooccur more often than chance and that correspond to arbitrary word usages.” The notion of arbitrarin...

Book Reviews: Syntax-Based Collocation Extraction

Collocation is a common language phenomenon which has attracted the interest of researchers in many subfields of both theoretical and computational linguistics. Although there is no commonly accepted and precise definition of this phenomenon, collocations are generally understood as complex lexical items, often characterized as unpredictable, idiosyncratic, holistic, mutually selective, and so ...

Collocation translation based on sentence alignment and parsing

To date, substantial efforts have been devoted to the extraction of collocations from text corpora. However, only a few works deal with the subsequent processing of results in order for these to be successfully integrated into the NLP applications that could benefit from them (e.g., machine translation). This paper presents an accurate method for identifying translation equivalents of collocati...

Publication date: 2014